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Convex optimization problem prototyping for 
image reconstruction in computed tomography 
with the Chambolle-Pock algorithm 

Emil Y. Sidky, Jakob H. J0rgensen, and Xiaochuan Pan 

Abstract 

The primal-dual optimization algorithm developed in Chambolle and Pock (CP), 201 1 is applied to various convex 
optimization problems of interest in computed tomography (CT) image reconstruction. This algorithm allows for rapid 
prototyping of optimization problems for the purpose of designing iterative image reconstruction algorithms for CT. 
The primal-dual algorithm is briefly summarized in the article, and its potential for prototyping is demonstrated by 
explicitly deriving CP algorithm instances for many optimization problems relevant to CT. An example application 
modeling breast CT with low-intensity X-ray illumination is presented. 

I. Introduction 

Optimization-based image reconstruction algorithms for CT have been investigated heavily recently due to their 
potential to allow for reduced scanning effort while maintaining or improving image quality iTfl. El. Such methods 
have been considered for many years, but within the past five years computational barriers have been lowered 
enough such that iterative image reconstruction can be considered for practical application in CT [3|. The transition 
to practice has been taking place alongside further theoretical development particularly with algorithms based on 
the sparsity-motivated ^-norm 0], 0, (6), Q, ©, 0, ED, (Til, 03- Despite the recent interest in sparsity, 
optimization-based image reconstruction algorithm development continues to proceed along many fronts and there 
is as of yet no consensus on a particular optimization problem for the CT system. In fact, it is beginning to look 
like the optimization problem, upon which the iterative image reconstruction algorithms are based, will themselves 
be subject to design depending on the particular properties of each scanner type and imaging task. 

Considering the possibility of tailoring optimization problems to a class of CT scanners, makes design of iterative 
image reconstruction algorithms a daunting task. Optimization formulations generally construct an objective function 
comprised of a data fidelity term and possible penalty terms discouraging unphysical behavior in the reconstructed 
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image, and they possibly include hard constraints on the image. The image estimate is arrived at by extremizing 
the objective subject to any constraints placed on the estimate. The optimization problems for image reconstruction 
can take many forms depending on image representation, projection model, and objective and constraint design. On 
top of this, it is difficult to solve many of the optimization problems of interest. A change in optimization problem 
formulation can mean many weeks or months of algorithm development to account for the modification. 

Due to this complexity, it would be quite desirable to have an algorithmic tool to facilitate design of optimization 
problems for CT image reconstruction. This tool would consist of a well-defined set of mechanical steps that 
generate a convergent algorithm from a specific optimization problem for CT image reconstruction. The goal of this 
tool would be to allow for rapid prototyping of various optimization formulations; one could design the optimization 
problem free of any restrictions imposed by a lack of an algorithm to solve it. The resulting algorithm might not 
be the most efficient solver for the particular optimization problem, but it would be guaranteed to give the answer. 

In this article we consider convex optimization problems for CT image reconstruction, including non-smooth 
objectives, unconstrained and constrained formulations. One general algorithmic tool is to use steepest descent or 
projected steepest descent [13|. Such algorithms, however, do not address non-smooth objective functions and they 
have difficulty with constrained optimization, being applicable for only simple constraints such as non-negativity. 
Another general strategy involves some form of evolving quadratic approximation to the objective. The literature 
on this flavor of algorithm design is enormous, including non-linear conjugate gradient methods [13|, parabolic 
surrogates [10], [14], and iteratively reweighted least-squares lfl5l , For the CT system these strategies often require 
quite a bit of know-how due to the very large scale and ill-posedness of the imaging model. Once the optimization 
formulation is established, however, these quadratic methods provide a good option to gain in efficiency. 

One of the main barriers to prototyping alternative optimization problems for CT image reconstruction is the size 
of the imaging model; volumes can contain millions of voxels and the sinogram data can correspondingly consist of 
millions of X-ray transmission measurements. For large-scale systems there has been some resurgence of first-order 
methods lfl6l . |T7l . Ifl8l . [19], l20l . ETIl and recently there has been applications of first-order methods specifically 
for optimization-based image reconstruction in CT ETI . H22H . |23l . These methods are interesting because they 
can be adapted to a wide range of optimization problems involving non-smooth functions such as those involving 
£i-based norms. In particular, the algorithm that we pursue further in this paper is a first-order primal-dual algorithm 
for convex problems by Chambolle and Pock |20l . This algorithm goes a long way toward the goal of optimization 
problem prototyping, because it covers a very general class of optimization problems that contain many optimization 
formulations of interest to the CT community. 

For a selection of optimization problems of relevance to CT image reconstruction, we work through the details 
of setting up the Chambolle-Pock algorithm. We refer to these dedicated algorithms as algorithm instances. Our 
numerical results demonstrate that the algorithm instances achieve the solution of difficult convex optimization prob- 
lems under challenging conditions in reasonable time and without parameter tuning. In Sec. [II] the CP methodology 



and algorithm is summarized; in Sec. Ill various optimization problems for CT image reconstruction are presented 



along with their corresponding CP algorithm instance; and Sec. IV shows a limited study on a breast CT simulation 
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that demonstrates the application of the derived CP algorithm instances. 

II. Summary of the generic Chambolle-Pock algorithm 

The Chambolle-Pock (CP) algorithm 11201 is primal-dual meaning that it solves an optimization problem simulta- 
neously with its dual. On the face of it, it would seem to involve extra work by solving two problems instead of one, 
but the algorithm comes with convergence guarantees and solving both problems provides a robust, non-heuristic 
convergence check - the duality gap. 

The CP algorithm applies to a general form of the primal minimization: 

min {F(Kx) + G{x)} , (1) 

X 

and a dual maximization: 

max {-F*(y)-G*(-K T y)}, (2) 
y 

where x and y are finite dimensional vectors in the spaces X and Y, respectively; K is a linear transform from X 
to Y; G and F are convex, possibly non-smooth, functions mapping the respective X and Y spaces to non-negative 
real numbers; and the superscript "*" in the dual maximization problem refers to convex conjugation, defined in 
Eqs. ([3]) and (|4]). We note that the matrix K need not be square; X and Y will in general have different dimension. 
Given a convex function H of a vector z € Z, its conjugate can be computed by the Legendre transform B24L and 
the original function can be recovered by applying conjugation again: 

ff*(z) = max {{z,z') z -H{z')}, (3) 

z' 

H(z') = max {{z', z) z - H*(z)} . (4) 

z 

The notation (-,-)z refers to the inner product in the vector space Z. 

Formally, the primal and dual problems are connected in a generic saddle point optimization problem: 

minmax {(Kx,y) Y + G(x) -F*(y)}. (5) 

x y 

By performing the maximization over y in Eq. {5}, using Eq. Q with Kx associated with y' , the primal minimization 
Eq. ([!]) is derived. Similarly, performing the minimization over x in Eq. |5]), using Eq. <j3j and the identity {Kx, y) = 
(x,K T y), yields the dual maximization Eq. (EJ, where the T superscript denotes matrix transposition. 

The minimization problem in Eq. ([TJ, though compact, covers many minimization problems of interest to 
tomographic image reconstruction. Solving the dual problem, Eq. simultaneously allows for assessment of 
algorithm convergence. For intermediate estimates x and y of the primal minimization and the dual maximization, 
respectively, the primal objective will be greater than or equal to the dual objective. The difference between these 
objectives is referred to as the duality gap, and convergence is achieved when this gap is zero. Plenty of examples 
of useful optimization problems for tomographic image reconstruction will be described in detail in Sec. [TTTJ but 
first we summarize Algorithm 1 from Ref. |20|. 
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Algorithm 1 Pseudocode for iV-steps of the basic Chambolle-Pock algorithm. The constant L is the ^2-norm of the 
matrix K; t and a are non-negative CP algorithm parameters, which are both set to 1/L in the present application; 
6 € [0, 1] is another CP algorithm parameter, which is set to 1; and n is the iteration index. The proximal operators 



1: L <r- \\K\\ 2 ; t <r- 1/L; a «- 1/L; 6 <- 1; n <- 

2: initialize xq and yo to zero values 

3: Xq <— So 

4: repeat 

5: <- prox a [F*](y n + aKx n ) 

6: x n+1 «- prox T [G](x n - TK T y n+1 ) 

1: X n+ i <- X n+ i + 0(x n+ i - X n ) 

8: n <— n + 1 

9: until n> N 



A. Chambolle-Pock: Algorithm 1 

The CP algorithm simultaneously solves Eqs. (JTJ and As presented in Ref. [20] the algorithm is simple, 
yet extremely effective. We repeat the steps here in Listing [T] for completeness, providing the parameters that we 
use for all results shown below. The parameter descriptions are provided in Ref. [20 1, but note that in our usage 
specified above there are no free parameters. This is an extremely important feature for our purpose of optimization 
prototyping. One caveat is that technically the proof of convergence for the CP algorithm assumes L 2 ctt < 1, but in 
practice we have never encountered a case where the choice a = r = 1/L failed to tend to convergence. We stress 
that in Eq. ^ the matrix K T needs to be the transpose of the matrix K\ this point can sometimes be confusing 
because K for imaging applications is often intended to be an approximation to some continuous operator such as 
projection or differentiation and often K T is taken to mean the approximation to the continuous operator's adjoint, 
which may or may not be the matrix transpose of K. The constant L is the magnitude of the matrix K, its largest 
singular value. Appendix |A] gives the details on computing L via the power method. Key to deriving the particular 
algorithm instances are the proximal mappings prox a [F*] and prox T [G) (called resolvent operators in Ref. |20|). 

The proximal mapping is used to generate a descent direction for the convex function H and it is obtained by 
the following minimization: 



This operation does admit non-smooth convex functions, but H does need to be simple enough that the above 
minimization can be solved in closed form. For CT applications the ability to handle non-smooth F and G allows 
the study of many optimization problems of recent interest, and the simplicity limitation is not that restrictive as 
will be seen. 



prox a and prox T are defined in Eq. ([6]). 




(6) 
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B. The CP algorithm for prototyping of convex optimization problems 

To prototype a particular convex optimization problem for CT image reconstruction with the CP algorithm, there 
are five basic steps: 

(1) Map the optimization problem to the generic minimization problem in Eq. ([TJ. 

(2) Derive the dual maximization problem, Eq. (j2j), by computing the convex conjugates of F and G using the 
Legendre transform Eq. ([3]). 

(3) Derive the proximal mappings of F and G using Eq. |6]). 

(4) Substitute the results of (3) into the generic CP algorithm in Listing [T] to obtain a CP algorithm instance. 

(5) Run the algorithm, monitoring the primal-dual gap for convergence. 

As will be seen below, a great variety of constrained and unconstrained optimization problems can be written in 
the form of Eq. ([T}. Specifically, using the algebra of convex functions E4l . that the sum of two convex functions is 
convex and that the composition of a convex function with a linear transform is a convex function, many interesting 
optimization formulations can be put in the form of Eq. ([T]). We will also make use of convex functions which are 
not smooth - notably l\ based norms and indicator functions 5g(x): 

{0 xeS 
(7) 
oo x S 

where S is a convex set. The indicator function is particularly handy for imposing constraints. In computing the 
convex conjugate and proximal mapping of convex functions, we make much use of the standard calculus rule 
for extremization, V/ = 0, but such computations are augmented also with geometric reasoning, which may be 
unfamiliar. Accordingly, we have included appendices to show some of these computation steps. With this quick 
introduction, we are now in a position to derive various algorithm instances for CT image reconstruction from 
different convex optimization problems. 

III. Chambolle-Pock algorithm instances for CT 

For this article, we only consider optimization problems involving the linear imaging model for X-ray projec- 
tion, where the data are considered as line integrals over the object's X-ray attenuation coefficient. Generically, 
maintaining consistent notation with Ref. l20l , the discrete-to-discrete CT system model |j25l can be written as: 

Au = g, (8) 

where A is the projection matrix taking an object represented by expansion coefficients u and generating a set of 
line-integration values g. This model covers a multitude of expansion functions and CT configurations, including 
both 2D fan-beam and 3D cone-beam projection data models. 

A few notes on notation are in order. In the following, we largely avoid indexing of the various vector spaces 
in order that the equations and pseudocode listings are brief and clear. Any of the standard algebraic operations 
between vectors is to be interpreted in a component-wise manner unless explicitly stated. Also, an algebraic operation 
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between a scalar and a vector is to be distributed among all components of the vector; e.g., 1 + v adds one to all 
components of v. For the optimization problems below, we employ three vector spaces: / the space of discrete 
images in either 2 or 3 dimensions; D the space of the CT sinograms (or projection data); and V the space of 
spatial-vector-valued image arrays, V = I d where d = 2 or 3 for 2D and 3D-space, respectively. For the CT 
system model Eq. ([HJ, u E I, and g E D, but we note that the space D can also include sinograms which are not 
consistent with the linear system matrix A. The vector space V will be used below for forming the total variation 
(TV) semi-norm; an example of such a vector v E V is the spatial-gradient of an image u. Although the pixel 
representation is used, much of the following can be applied to other image expansion functions. As we will be 
making much use of certain indicator functions, we define two important sets, Box(a) and Ball(a), through their 
indicator function: 

|o ||as||oo<a 

<W(a)W = < (9) 

and 

|0 ||x||a<a 

S B all(a)(x) = I ■ (10) 

I oo ||x||2 > a 

Recall that the || • norm selects the largest component of the argument, thus Box(a) comprises vectors with no 
component larger than a (in 2D Box (a) is a square centered on the origin with width 2a). We also employ Ox 
and l x to mean a vector from the space X with all components set to and 1, respectively. 

A. Image reconstruction by least-squares 

Perhaps the simplest optimization method for performing image reconstruction is to minimize the the quadratic 
data error function. We present this familiar case in order to gain some experience with the mechanics of deriving 
CP algorithm instances, and because the quadratic data error term will play a role in other optimization problems 
below. The primal problem of interest is: 

min \\\Au-g\\ 2 2 . (11) 
To derive the CP algorithm instance, we make the following mechanical associations with the primal problem Eq. 

F{y) = \\\y-g\\l (12) 

G{x) = 0, (13) 
x = u, y = Au (14) 
K = A. (15) 
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Applying Eq. [3] we obtain the convex conjugates of F and G: 

F*(P) = \\\P\\1 + {V,9)D, 

G*{q)=8 0l {q), (16) 

where p E D and q E I. While obtaining F* in this case involves elementary calculus for extremization of Eq. <|3j, 
finding G* needs some comment for those unfamiliar with convex analysis. Using the definition of the Legendre 
transform for G(x) = 0, we have: 

G*(q) = max.(q,x)i. (17) 

X 

There are two possibilities: (1) q = Oj, in which case the maximum value of (q, x)i is 0, and (2) q ^ 0/, in which 
case this inner product can increase without bound, resulting in a maximum value of oo. Putting these two cases 
together yields the indicator function in Eq. ( ]16[ l. With F, G, and their conjugates, the optimization problem dual 
to Eq. (Ill can be written down from Eq. (|2]): 

max ^-l\\p\\l-{ p ,g} D -S 0l (-A T p)\. (18) 

For deriving the CP algorithm instance, it is not strictly necessary to have this dual problem, but it is useful for 
evaluating convergence. 



The CP algorithm solves Eqs. (Ill and (18i simultaneously. In principle, the values of the primal and dual 
objective functions provide a test of convergence. During the iteration the objective of the primal problem will by 
greater than the objective of the dual problem, and when the solutions of the respective problems are reached, these 
objectives will be equal. Comparing the duality gap, i.e. the difference between the primal objective and the dual 
objective, with thus provides a test of convergence. The presence of the indicator function in the dual problem, 
however, complicates this test. Due to the negative sign in front of the indicator, when the argument is not the zero 
vector, this term and therefore the whole dual objective is assigned to a value of — oo. The dual objective achieves 
a finite, testable value only when the indicator function attains the value of 0, when A T p = 0/. Effectively, the 
indicator function becomes a way to write down a constraint in the form of a convex function, in this case an 
equality constraint. The dual optimization problem can thus alternately be written as a conventional constrained 
maximization: 

max - (p,.9>/j| such that A T p = 0j. (19) 

The convergence check is a bit problematic, because the equality constraint will not likely be strictly satisfied in 
numerical computation. Instead, we introduce a conditional primal-dual gap (the difference between the primal and 
dual objectives ignoring the indicator function) given the estimates v! and p'\ 

cPD(u',p') = \\Au' -g\\\+ l -\\p'\\l + (p',g) D , (20) 

and separately monitor A T p' to see if it is tending to 0/. Note that the conditional primal-dual gap need not be 
positive, but it should tend to zero. 
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To finally attain the CP algorithm instance for image reconstruction by least-squares, we derive lines [5] and [6] i 
Alg. [T] The proximal mapping prox a [F*](y), y € D, for this problem results from a quadratic minimization: 



in 



prox a [F*](y) = argmin 

y' 




} 



(21) 



y - o\9 



~ 1 + a ' 

and as G(x) = 0, x € /, the corresponding proximal mapping is 



procc T [G](a;) = .t. 



(22) 



Substituting in the arguments from the generic algorithm, leads to the update steps in Listing [2] The constant 



Algorithm 2 Pseudocode for TV-steps of the least-squares Chambolle-Pock algorithm instance. 



1: 


L <- ||A|| 2 ; t <- ct <- 1/L; <- 1 


; n <- 


2: 


initialize uq and po to zero values 




3: 


u <- u 




4: 


repeat 




5: 


Pn+l <~ (Pn + o{Au n - g))/ (1 + cr) 




6: 


U n+ i <- U„ - TA T p„ + i 




7: 
8: 


u„+i m„+i + 6»(u„ + i - u n ) 
n <— n + 1 




9: 


until 7i > TV 





L = \\A\\2 is the largest singular value of A (see Appendix [A] for details on the power method). Crucial to the 
implementation of the CP algorithm instance is that A T be the exact transpose of A, which is a non-trivial matter 
for tomographic applications, because the projection matrix A is usually computed on-the-fly f26j, [27 1, [28|. 
Convergence of the CP algorithm is only guaranteed when A T is the exact transpose of A, although it may be 
possible to extend the CP algorithm to mismatched projector/back-projector pairs by employing the analysis in Ref. 



This derivation of the CP least-squares algorithm instance illustrates the method on a familiar optimization 
problem, and it provides a point of comparison with standard algorithms; this quadratic minimization problem can 
be solved straight-forwardly with the basic, linear conjugate gradients (CG) algorithm. Another important point 
for this particular algorithm instance, where limited projection data can lead to an underdetermined system, is that 
the CP algorithm will yield a minimizer of the objective \\Au — g\\\ which depends on the initial image uq. In 
this case, it is recommended to take advantage of the prototyping capability of the CP framework to augment 
the optimization problem so that it selects a unique image independent of initialization. For example, one often 
seeks an image closest to either 0/ or a prior image, which can be formulated by adding a quadratic term or 
|| u — UpHor || 2 w i m a small combination coefficient. 



[29| 
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1 j Adding in non-negativity constraints: One of the flexibilities of the CP method becomes apparent in adding 
bound constraints. While CG is also flexible tool for dealing with large and small quadratic optimization, modifi- 
cation to include constraints, such as non-negativity, considerably complicates the CG algorithm. For CP, adding 
in bound constraints is simply a matter of introducing the appropriate indicator function into the primal problem: 

nun ji||i4u- ff ||2 + j, (23) 

where the set P is all u with non-negative components. Again, we make the mechanical associations with the 
primal problem Eq. (QJ: 

F{y) = \\\y-9\\l (24) 

G{x) = 5 P (x), (25) 



x = u, y = Kx, (26) 
K = A. (27) 

The difference from the unconstrained problem is the function G(x). It turns out that the convex conjugate of 5p(x) 
is: 

5* P (x) = 5 P (-x), (28) 

see Appendix |B] for insight on convex conjugate of indicator functions. Straight substitution of G* and F* into Eq. 
yields the dual problem: 

max |-i||p||l - (p,g) D -5p(A T p)} . (29) 

As a result the conditional primal-dual gap is the same as before. The difference now is that the constraint checks 
are that A T p and u should be non-negative. 

To derive the algorithm instance, we need the proximal mapping prox T [G], which by definition is: 

prox T [5 P ]{x) = argmin is P (x') + j . (30) 

The indicator in the objective prevents consideration of negative components of x'. The £2 term can be regarded 
as a sum over the square difference between components of x and x'; thus the objective is separable and can be 
minimized by constructing x' such that x\ — Xi when x.- L > and x\ = when x\ < 0. Thus this proximal mapping 
becomes a non-negativity thresholding on each component of x: 

{Xi Xi > 
(31) 
x t < 

Substituting into the generic pseudocode yields Listing [5] Again, we have L = \\A\\ 2 - The indicator function Sp 
leads to the intuitive modification that non-negativity thresholding is introduced in line [6] of Listing [3] In this case 
the non-negativity constraint in u will be automatically satisfied by all iterates u n . Upper bound constraints are 
equally simple to include. 
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Algorithm 3 Pseudocode for TV-steps of the least-squares with non-negativity constraint, CP algorithm instance. 

l: L <r- \\A\\ 2 ; t <r- l/L; a 1/L; 9 <- 1; n <- 

2: initialize uo and po to zero values 

3: Uo Uq 

4: repeat 

5: p n+1 «- (p n + cr(Au„ - #))/ (1 + cr) 
6: w„+i «- pos(u n - rA T p n+1 ) 
+ 0(u n+1 - U n ) 

8: n <- n + 1 
9: until n> N 



B. Optimization problems based on the Total Variation (TV) semi-norm 

Optimization problems with the TV semi-norm have received much attention for CT image reconstruction lately 
because of their potential to provide high quality images from sparse view sampling JS), BUI . 112211 . (9|, |[3~T1 . 
(32), 3 3 1 . The TV semi-norm has been known to be useful for performing edge-preserving regularization, and 
recent developments in compressive sensing have sparked even greater interest in the use of this semi-norm. 
Algorithm-wise the TV semi-norm is difficult to handle. Although it is convex, it is not linear, quadratic or even 
everywhere-differentiable, and the lack of differentiability precludes the use of standard gradient-based optimization 
algorithms. In this sub-section we go through, in detail, the derivation of a CP algorithm instance for a TV- 
regularized least squares data error norm. We then consider the Kullback-Leibler (KL) data divergence, which is 
implicitly employed by many iterative algorithms based on maximum likelihood expectation maximization (MLEM). 
We also consider a data error norm based on li which can have some advantage in reducing the impact of image 
discretization error, which generally leads to a highly non-uniform error in the data domain. Finally, we derive a CP 
algorithm instance for constrained TV-minimization, which is mathematically equivalent to the least-squares-plus- 
TV problem J34), but whose data-error constraint parameter has more physical meaning than the parameter used 
in the corresponding unconstrained minimization. While the previous CP instances solve optimization problems, 
which can be solved efficiently by well-known algorithms, the following CP instances are new for the application 
of CT image reconstruction. 

The optimization problem of interest is 



where the last term, the ^i-norm of the gradient-magnitude image, is the isotropic TV semi-norm. The spatial -vector 
image Vu represents a discrete approximation to the image gradient which is in the vector space V, i.e., the space 
of spatial-vector-valued image arrays. The expression |Vit| is the gradient-magnitude image, an image array whose 
pixel values are the gradient magnitude at the pixel location. Thus, Vu G V and |Vtt| € /. Because V is defined in 
terms of finite differencing, it is a linear transform from an image array to a vector-valued image array, the precise 




(32) 
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form of which is covered in Appendix [D] This problem was not explicitly covered in Ref. 11201 . and we fill in the 
details here. For this case, matching the primal problem to Eq. ([T| is not as obvious as the previous examples. We 
recognize in Eq. ( [32| that both terms involve a linear transform, thus the whole objective function can be written 
in the form F(Kx) with the following assignments: 

F(y,z)=F 1 (y)+F 2 (z), F^y) = ~\\y - gg, F 2 (z) = A IKi^Dlli , (33) 
G(x) = 0, (34) 



x = u, y = Au, z = Vu, (35) 
K =(£), (36) 



where u € I, y € D, and z £ V . Note that F(y, z) is convex because it is the sum of two convex functions. 
Also the linear transform K takes an image vector x and gives a data vector y and an image gradient vector z. 
The transpose of K, K T — (A T ', —div), will produce an image vector from a data vector y and an image gradient 
vector z: 

x <— A T y — div z, (37) 

where we use the same convention as in Ref. l20l that —div = V T , see Appendix |d| 

In order to get the convex conjugate of F we need F 2 *. For readers unfamiliar with the Legendre transform of 
indicator functions Appendix [B] illustrates the transform of some common cases. By definition, 

F 2 *( g )=max {(q,z) v -X\\(\z\)\\ 1 }, (38) 

z 

where q £ V, like z, is a vector-valued image array. There are two cases to consider: (1) the magnitude image \q\ 
at all pixels is less than or equal to A, i.e. \q\ £ Box(X) and (2) the magnitude image \q\ has at least one pixel 



greater than A, i.e. \q\ Box(X). It turns out that for the former case the maximization in Eq. (38 i yields 0, while 
the latter cause yields oo. Putting these two cases together, we have 

F;(q) = 6 BoxW (\q\). (39) 

The conjugates of F and G are: 

F*( P ,q) = l\\p\\t+ (p,g)o + S Box(x) (\q\), (40) 
G*(r)=S 0l (r), (41) 

where p £ D, q £ V, and r £ I. 



The problem dual to Eq. ( |32] i becomes: 

1 



max 

p, i 



2 lbll2 - &9)d -<W(A)(M) -S 0l (divq- A p)\. (42) 



The resulting conditional primal-dual gap is 



cPD(u',p',q') = \\\Au' - g\\l + A HdV^DH, + l -\\p'\\l + (p',g) P (43) 
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with additional constraints \q'\ € Box(X) and A T p' — div q' = 0/. The final piece needed for putting together the 



CP algorithm instance for Eq. (32i is the proximal mapping: 

prox a [F*](y, z) = 7^^) • (44) 

\ 1 + a max(Alj, \z\) J 

The proximal mapping of the data term was covered previously, and that of the TV term is explained in Appendix 
|c| With the necessary pieces in place, the CP algorithm instance for the objective can be written down in 

Listing |4] Line [6] and the corresponding expression in Eq. (44 1, require some explanation, because the division 



Algorithm 4 Pseudocode for A-steps of the l\-TV CP algorithm instance. 

l: L <- \\(A, V)|| 2 ; r -s- l/L; a <- l/L; 9 1; n -s- 

2: initialize u , Pq, and q to zero values 

3: Uq <— Uq 

4: repeat 

5: p n+1 <- (p n + cr(Au n - £?))/ (1 + cr) 

6: g n +i <~ A(g„ + crVu„)/ max(Al/, |g„ + crVu n |) 
7: u n+ i <-ii„- rA T p n+ i + rdiu g n+1 
8: U n+1 4- U n+1 + 9(u n+1 ^nj 

9: n <- n + 1 

10: until n> N 



operation is non-standard as the numerator is in V and the denominator is in /. The effect of this line is to threshold 
the magnitude of the spatial-vectors at each pixel in q n + aWu n to the value A: spatial-vectors larger than A have 
their magnitude rescaled to A. The resulting thresholded, spatial-vector image is then assigned to g n +i. Recall that 
1/ at line [6] is an image with all pixels set to 1. The operator | • | in this line converts a vector- valued image in V 
to a magnitude image in /, and the max(Al/, •) operation thresholds the lower bound of the magnitude image to 
A pixel-wise. Operationally, the division is performed by dividing the spatial-vector at each pixel of the numerator 
by the scalar in the corresponding pixel of the denominator. Another potential source of confusion is computing 
the magnitude \\(A, V) || 2 ■ The power method for doing this is covered explicitly in Appendix [A| If it is desired to 
enforce the positivity constraint, the indicator 6p(u) can be added to the primal objective, and the effect of this is 
indicator is the same as for Listing |3j namely the right hand side of line [7] goes inside the pos(-) operator. 

1) Alternate data divergences: For a number of reasons motivated by the physical model of imaging systems, it 
may be of use to formulate optimization problems for CT image reconstruction with alternate data-error terms. A 
natural extension of the quadratic data divergence is to include a diagonal weighting matrix. The corresponding CP 
algorithm instance can be easily derived following the steps mentioned above. As pointed out above, the CP method 
is not limited to quadratic objective functions and other important convex functions can be used. We derive, here, 
three additional CP algorithm instances. For alternate data divergences we consider the oft-used KL divergence, 
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and one not so commonly used, l\ data-error norm. For the following, we need only analyze the function Fx, as 
everything else remains the same as for the objective in Eq. (32i. 



TV plus KL data divergence: One data divergence of particular interest for tomographic image reconstruction 
is KL. Objectives based on KL are what is being optimized in the various forms of MLEM, and it is used often 
when data noise is a significant physical factor and the data are modeled as being drawn from a multivariate Poisson 
probability distribution [25). For the situation where the view-sampling is also sparse, it might be of interest to 
combine a KL data error term with the TV semi-norm in the following primal optimization: 

mm |^[Au-. 9 + 5 ln 5 - 5 ln(pos(v4u))] 4 + (5p(AM) + A||(|Vu|)|| 1 | , (45) 

where ^J-]i performs summation over all components of the vector argument. This example proceeds as above 
except that the Fx function is different: 

F i(y) = £ \y -9 + .9 in .9 - 9Hp° s iy))}i + My) ( 46 ) 

i 

where y 6 D, and the function In operates on the components of its argument. Use of the KL data divergence 
makes sense only with positive linear systems A and non-negative pixel values u and data g. However, by defining 
the function over the whole space and using an indicator function to restrict the domain [24], a wide variety of 
optimization problems can be treated in a uniform manner. Accordingly, dp is introduced into the Fx objective and 
the pos operator is used just so that this objective is defined in the real numbers. The derivation of F*, though 
mechanical, is a little bit too long to be included here. We simply state the resulting conjugate function: 

Fi(p) = E [-gfopo'Q-D -P)]i + S P {1 D -P). (47) 

i 

The resulting dual problem to Eq. (|45| is thus: 



mE Jf |5Z [ff to pos (Id -p)]i - 5 P (1 D -p)- 8 Bo x(\)(\q\) - S 0l (divq- A T p) j . (48) 

To form the algorithm instance, we need the proximal mapping prox a [Fx\{y) 

prox a [F*}(y) = l(l D+y - ^ (y - l D f + 4a 9 ) . (49) 

An interesting point in the derivation, shown partially in Appendix [c] of prox a [Fi](y) is that the quadratic equation 
is needed, and the support function in F*(p) is used to select the correct (in this case negative) root of the 
discriminant in the quadratic formula. With the new function Fx, its conjugate, and the conjugate's proximal 
mapping, we can write down the CP algorithm instance. Listing [5] gives the CP algorithm instance minimizing 
a KL plus TV semi-norm objective. The difference between this algorithm instance and the previous case 
comes only at the update at line [5] This algorithm instance has the interesting property that the intermediate image 
estimates u n can have negative values even though the converged solution will be non-negative. If it is desirable to 
have the intermediate image estimates be non-negative, the non-negativity constraint can be easily introduced by 
adding the indicator 5p(u) to the primal objective, resulting in the addition of the pos(-) operator at line [7] as was 
shown in Listing [3] 
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Algorithm 5 Pseudocode for TV-steps of the KL-TV CP algorithm instance. 



1: 


L 4r- \\(A, V)|| a ; t <- 1/i; <r <- 1/i; 4 


- 1; n <- 




2: 


initialize uq, po, and go to zero values 






3: 


u <- u 






4: 


repeat 






5: 


p n+ l «- ^ ( 1_D + Pn + crAf2„ - ^ (p n - 


1- ctAm„ - l£>) 2 




6: 


q n +i <- A(g n + crVw„)/ max(Al/, |g„ - 


1- ctVm„ ) 




7: 
8: 


"n+l <- «n - T^ T Pn4-l + T(liv q n+1 
U n +1 *~ U n+ i + 6»(u„ + i - M„) 






9: 


n n + 1 






10: 


until n > N 







7Y p/Mi data-error norm: The combination of TV semi-norm regularization and l\ data-error norm has 
been proposed for image denoising and it has some interesting properties for that purpose ll35l . This objective is 
also presented in Ref. |20|. For tomography, this combination may be of interest because the l\ data-error term is an 
example of a robust fit to the data. The idea of robust approximation is to weakly penalize data that are outliers |36|. 
Fitting with the commonly used quadratic error function, clearly puts heavy weight on outlying measurements which 
in some situations can lead to streak artifacts in the images. In particular, for tomographic image reconstruction 
with a pixel basis, discretization error and metal objects can lead to highly non-uniform error in the data model. Use 
of the l\ data-error term may allow for large errors for measurements along the tangent rays to internal structures, 
where discretization can have a large effect. The l\ data-error term also puts greater emphasis on fitting the data 
that lie close to the model. The primal problem of interest is: 



min {\\Au - g\\i + X IKlVwDHJ . 



(50) 



For this objective, the function F\ is: 



F 1 {y) = \\y-g\\ l . 



(51) 



Computing the convex conjugate Fj* yields 



F*(p) = 5 Bo x(i){p) + (p,9)d, 



(52) 



and the resulting dual problem is: 



max {-S Bo x(i)(p) - (p,9)d - S Box {X)(\q\) ~ 6 0l (divq - A T 



P)}- 



(53) 



The proximal mapping necessary for completing the algorithm instance is 



prox a [F*](y) 



y - go 



(54) 



Tasx.{l D Ay-ga\) 
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where ljj is a data array with each component set to one and the max operation is performed component-wise. 
The corresponding pseudo-code for minimizing Eq. ( 50 1 is given in Listing [6] where the only difference between 
this code and the previous two occurs at line [5] The ability to deal with non-smooth objectives uncomplicates this 



Algorithm 6 Pseudocode for TV-steps of the ^i-TV CP algorithm instance. 



1 . 
1 . 


T i \\( A V7M|„. 
^ ^ \\\ A i V )\\2, 


T J— 1 IT ■ rr 1 1 / T • ft I 1 ■ n I D 


L. 


UllLlallZc (iQ, Pq, 


on/i it. Torn 1 70 lnoc 

dllU t/g LU ZclO VdlUcS 


3: 


u i- Uq 




4: 


repeat 




5: 


p n+ i <- (p n 4 


- a(Au n - g))/ max(l D , \p n + a(Au n - g)\) 


6: 


q n +i <- X(q n 


4 erV£t„)/ max(Al/, \q n + aVu n \) 


7: 


u n +i *- u„ - 


TA T p n+1 + rdivq n+ i 


8: 


u n+ i <- u n+1 


+ 0(u n+1 - u n ) 


9: 


n 4— n + 1 




10: 


until n > N 





particular problem substantially. If smoothness were required, there would have to be smoothing parameters on both 
the l\ and TV terms, adding two more parameters than necessary to a study of the image properties as a function 
of the optimization-problem parameters. 

2) Constrained, TV-minimization: The previous three optimization problems combine a data fidelity term with a 
TV-penalty, and the balance of the two terms is controlled by the parameter A. An inconvenience of such optimization 
problems is that it is difficult to physically interpret A. Focusing on combining an £2 data-error norm with TV, 
reformulating Eq. ( [32] ) as a constrained, TV-minimization leads to the following primal problem: 

min {||(|Vu|)||! 4- <W(e)(^ -<?)}= (55) 

where S Ba u^(Au — g) is zero for \\Au — <? 1 1 2 < £■ When e > 0, this problem is equivalent to the unconstrained 
optimization Eq. p2| , see e.g. Ref. 11341 . in the sense that for each positive e there is a corresponding A yielding 
the same solution. For this constrained, TV-minimization, the function F\ is 

F\{y) = $Baii( e ){y - g)- (56) 

The corresponding conjugate is 

*i(p) = e|bl|2 + <P,S>D, (57) 

leading to the dual problem: 

max{-e||p|| 2 - (p,g)D - S Box (i)(\q\) ~ S 0l (divq- A T p)} . (58) 

p,q 

Again, for the algorithm instance we need the proximal mapping prox a [Fl\. 

prox„[F*]{y) = max(\\y - ag\\ 2 ~ ere, 0) (y - ag). (59) 
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The main points in deriving this proximal mapping are discussed in Appendix [C] and it is an example where 
geometric/symmetry arguments play a large role. Listing [7] shows the algorithm instance solving Eq. ( |55] l, where 
once again only line [5] is modified. This algorithm instance essentially achieves the same goal as Listing |4j the only 



Algorithm 7 Pseudocode for A-steps of the li -constrained, TV-minimization CP algorithm instance. 



1: 


L<- ll(AV)|| 2 ; 


t 4— 1/L] <j <— 1/L; <— 1; n 


2: 


initialize 


Uq, Pq, 


and qo to zero values 


3: 


Uq <- Uq 






4: 


repeat 






5: 


Pn+1 * 


- max( 


\\p n + a{Au n - g)\\ 2 - ere, 0) (p„ + a(Au n - g)) 


6: 


q n +i <■ 


- (?„ + 


a\7u n )/ max(l/, \q n + oS7u n \) 


7: 


U n +1 < 


-u n - 


TA T p n+1 + rdivq n+ i 


8: 


U n +1 < 


- u n+1 


+ 0(u n+ i - u n ) 


9: 


n <— n 


+ 1 




10: 


until n> N 





difference is that the parameter e has an actual physical interpretation, being the data-error bound. 

IV. Demonstration of CP algorithm instances for tomographic image reconstruction 

In the previous section, we have derived CP algorithm instances covering many optimization problems of interest 
to CT image reconstruction. Not only are there the seven optimization problems, but within each case the system 
model/matrix A, the data g, and optimization problem parameters can vary. For each of these, practically infinite 
number of optimization problems, the corresponding CP algorithm instances are guaranteed to converge [20 1 . The 
purpose of this results section is not to advocate one optimization problem over another; rather to demonstrate 
the utility of the CP algorithm for optimization problem prototyping. For this purpose, we present example image 
reconstructions that could be performed in a study for investigating the impact of matching the data-divergence 
with data noise model for image reconstruction in breast CT. 

A. Sparse-view experiments for image reconstruction from simulated CT data 

We briefly describe the significance of the experiments, but we point out that the main goal here is to demonstrate 
the CP algorithm instances. Much of the recent interest in employing the TV semi-norm in optimization problems for 
CT image reconstruction has been generated by compressive sensing (CS). CS seeks to relate sampling conditions 
on a sensing device with sparsity in the object being scanned. So far, mathematical results have been limited to 
various types of random sampling ll37ll . System matrices such as those representing CT projection fall outside of 
the scope of mathematical results for CS (8). As a result, the only current option for investigating CS in CT is 
through numerical experiments with computer phantoms. 
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A next logical step for bridging theoretical results for CS to actual application is to consider physical factors in 
the data model. One such factor is a noise model, which can be quite important for low-dose CT applications such 
as breast CT. While much work has been performed on iterative image reconstruction with various noise models 
under conditions of full sampling, little is known about the impact of noise on sparse-view image reconstruction. 
In the following limited study, we set up a breast CT simulation to investigate the impact of correct modeling of 
data noise with the purpose of demonstrating that the CP algorithm instances can be applied to the CT system. 

B. Sparse-view reconstruction with a Poisson noise model 




Fig. 1. Breast phantom for CT and FBP reconstructed image for a 512-view data set with Poisson distributed noise. Left is the phantom in the 
gray scale window [0.95,1.15]; middle is the same phantom with a blow-up on the micro-calcification ROI displayed in the gray scale window 
[0.9,1.8]; and right is the FBP image reconstructed from the noisy data. The middle panel is the reference for all image reconstruction algorithm 
results. The FBP image is shown only to provide a sense of the noise level. 

For the following study, we employ a digital 256 x 256 breast phantom, described in Ref. J23), l38l . and used in 
our previous study on investigating sufficient sampling conditions for TV-based CT image reconstruction [12|. The 
phantom models for tissue types: the background fat tissue is a assigned a value of 1.0, the modeled fibro-glandular 
tissue takes a value of 1.1, the outer skin layer is set to 1.15, and the micro-calcifications are assigned values in 
the range [1.8,2.3]. 

For the present case, we focus on circular, fan-beam scanning with 60 projections equally distributed over a full 
360° angular range. The simulated radius of the X-ray source trajectory is 40cm with a source-detector distance of 
80cm. The detector sampling consists of 512 bins of size 200 microns. The system matrix for the X-ray projection is 
computed by the line-intersection method where the matrix elements of A are determined by the length of traversal 
in each image pixel of each source/detector-bin ray. For this phantom under ideal conditions, we have found that 
accurate recovery is possible with constrained, TV-minimization with as few as 50 projections. In the present study, 
we add Poisson noise to the data model at a level consistent with what might be expect in a typical breast CT 
scan. The Poisson noise model is chosen in order to investigate the impact of matching the data-error term to the 
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noise model. For reference, the phantom is shown in Fig. [T] To have a sense of the noise level, a standard fan-beam 
filtered back-projection reconstruction is shown alongside the phantom for simulated Poisson noise. 

For this noise model, the maximum likelihood method prescribes minimizing the KL data divergence between the 
available and estimated data. To gauge the importance of selecting a maximum likelihood image, we compare the 



results from two optimization problems: a KL data divergence plus a TV-penalty, Eq. (45 1 above; and a least-squares 



data error norm plus a TV-penalty, Eq. ( 32 1 above. With the CP framework, these two optimization problems can 
be easily prototyped: the solutions to both problems can be obtained without worrying about smoothing the TV 
semi-norm, setting algorithm parameters, or proving convergence. 




Fig. 2. Images reconstructed from 60-view projection data with a Poisson distributed noise model. The top row of images result from 
minimizing the l^-TV objective in Eq. |32| for A = 1 X 10~ 4 , 5 X 10~ 5 , and 2 X 10 -5 , going from left to right. The bottom row of images 
result from minimizing the KL-TV objective in Eq. {45} for the same values of A. Note that A does not necessarily have the same impact on 
each of these optimization problems. Nevertheless, we see similar trends for the chosen values of A. 



For the phantom and data conditions, described above, the images for different values of the TV-penalty parameter 
A are shown in Fig. [2] An ROI of the micro-calcification cluster is also shown. The overall and ROI images give 
an impression of two different visual tasks important for breast imaging: discerning the fibro-glandular tissue 
morphology and detection/classification of micro-calcifications. The images show some difference between the two 
optimization problems; most notably there is a perceptible reduction in noise in the ROIs from the KL-TV images. 
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A firm conclusion, however, awaits a more complete study with multiple noise realizations. 

The most critical feature of the CP algorithm that we wish to promote is the rapid prototyping of a convex 
optimization problem for CT image reconstruction. The above study is aimed at a combination of using a data 
divergence based on maximum likelihood estimation with a TV-penalty, which takes advantage of sparsity in 
the gradient magnitude of the underlying object. The CP framework facilitates the use of many other convex 
optimization problems, particularly those based on some form of sparsity, which often entail some form of the non- 
smooth £i-norm. For example, in Ref. [8| we have found it useful for sparse-view X-ray phase-contrast imaging 
to perform image reconstruction with a combination of a least-squares data fidelity term, an l\ -penalty promoting 
object sparseness, and an image TV constraint to further reduce streak artifacts from angular under-sampling. Under 
the CP framework, prototyping various combinations of these terms as constrained or unconstrained optimization 
problems becomes possible and the corresponding derivation of CP algorithm instances follows from the steps 
described in Sec. |II-B| Alternative, convex data fidelity terms and image constraints motivated by various physical 
models may also be prototyped. 

As a practical matter, though, it is important to have some sense of the convergence of the CP algorithm 
instances. To this end, we take an in depth look at individual runs for the KL-TV algorithm instance for CT image 
reconstruction. 



C. Iteration dependence of the CP algorithm 




Fig. 3. (Left) Convergence of the partial primal-dual gap for the CP algorithm instance solving Eq. {45} for different values of A. (Right) 
Plot indicating agreement with condition 1: — A^pHoo, the magnitude of the largest component of the argument of the last indicator 

function of Eq. J48j. Collecting all the indicator functions of the primal, Eq. |45j, and dual, Eq. l |48) , KL-TV optimization problems, we have 
four conditions to check in addition to the conditional primal-dual gap: (1) divq — A T p = 0/, (2) Au > Oo, (3) p < Id, ™d (4) |g| < A. 
The agreement with condition 1 is illustrated in the plot; agreement with condition 2 has a similar dependence; condition 3 is satisfied early on 
in the iteration; and condition 4 is automatically enforced by the CP algorithm instance for KL-TV. Because the curves are bunched together 
in the condition 1 plot, they are differentiated in color. 



Through the methods described above, many useful algorithm instances can be derived for CT image recon- 
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struction. It is obviously important that the resulting algorithm instance reaches the solution of the prescribed 
optimization problem. To illustrate the convergence of a resulting algorithm instance we focus on the TV-penalized 



KL data divergence, Eq. (45 1, and plot the conditional primal-dual gap for the different runs with varying A in Fig. 



[3] Included in this figure is a plot indicating the convergence to agreement with the most challenging condition 



set by the indicator functions in Eq. (48 1. For the present results we terminated the iteration at a conditional 
primal-dual gap of 10 -5 , which appears to happen on the scale of thousands of iterations with smaller A requiring 
more iterations. Interestingly, a simple pre-conditioned form of the CP algorithm was proposed in Ref. Il39ll . which 
appears to perform efficiently for small A. The pre-conditioned CP algorithm instance for this problem is reported 
in Appendix [E] 

V. Discussion 

This article has presented the application of the Chambolle-Pock algorithm to prototyping of optimization 
problems for CT image reconstruction. The algorithm covers many optimization problems of interest allowing 
for non-smooth functions. It also comes with solid convergence criteria to check the image estimates. 

The use of the CP algorithm we are promoting here is for prototyping; namely, when the image reconstruction 
algorithm development is at the early stage of determining important factors in formulating the optimization problem. 
As an example, we illustrated a scenario for sparse-view breast CT considering two different data-error terms. In 
this stage of development it is helpful to not have to bother with algorithm parameters, and questions of whether 
or not the algorithm will converge. After the final optimization problem is determined, then the focus shifts from 
prototyping to efficiency. 

Optimization problem prototyping for CT image reconstruction does have its limitations. For example, in the breast 
CT simulation presented above, a more complete conclusion requires reconstruction from multiple realizations of 
the data under the Poisson noise model. Additional important dimensions of the study are generation of an ensemble 
of breast phantoms and considering alternate image representations/projector models. Considering the size of CT 
image reconstruction systems and huge parameter space of possible optimization problems, it is not yet realistic 
to completely characterize a particular CT system. But at least we are assured of solving isolated setups and it is 
conceivable to perform a study along one aspect of the system, i.e. consider multiple realizations of the random data 
model. Given the current state of affairs for optimization-based image reconstruction, it is crucial that simulations 
be as realistic as possible. There is great need for realistic phantoms, and data simulation software. 

We point out that it is likely at least within the immediate future, that optimization-based image reconstruction will 
have to operate at severely truncated iteration numbers. Current clinical applications of iterative image reconstruction 
often operate in the range of one to ten iterations, which is likely far too few for claiming that the image estimate 
is an accurate solution to the designed optimization problem. But at least the ability to prototype an optimization 
problem can potentially simplify the design phase by separating optimization parameters from algorithm parameters. 
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Appendix A 
Computing the norm of K 

The matrix norm used for the parameter L in the CP algorithm instances is the largest singular value of K. 
This singular value can be obtained by the standard power method specified in Listing [8] When K represents 

Algorithm 8 Pseudocode for A-steps of the generic power method. The scalar s tends to \\K 1 1 2 as N increases. 
1: initialize xq € I to a non-zero image 

2: n <- 

3: repeat 

4: X n+ i <- K T Kx n 

5: X n+ i ^ X n+ i/\\x n+ i\\ 2 

6: S <r- \\Kx n+1 \\ 2 

i: n <- n + 1 

8: until n> N 



the discrete X-ray transform, our experience has been that the power method converges to numerical precision in 
twenty iterations or less. In implementing the CP algorithm instance for TV-penalized minimization, the norm of 
the combined linear transform V)||2 is needed. For this case, the program is the same as Listing [8] where 
K T Kx n becomes A T Ax n — divVx n ; recall that — div = V T . Furthermore, to obtain s, the explicit computation 

is s = y/\\Ax n+1 \\l + jjVx^ijjf. 

Appendix B 

The convex conjugate of certain indicator functions of interest illustrated in one-dimension 

This appendix covers the convex conjugate of a couple of indicator functions in one dimension, serving to 
illustrate how geometry plays a role in the computation and to provide a mental picture on the conjugate of higher 
dimensional indicator functions. 

Consider first the indicator Sp(x), which is zero for x > 0. The conjugate of this indicator is computed from: 

Sp(x) — max 4>{x') = max {x'x — Sp(x')} . (60) 

x' x' 

To perform this maximization, we analyze the cases, x < and x > 0, separately. As a visual aid, we plot the 
objective for these two cases in Fig. |4] From this figure it is clear that when x < 0, the objective's maximum is 
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0(x') 



ri(x') 



x < 



x > 



Fig. 4. Illustration of the objective function, labeled (/>(x'), in the maximization described by Eq. j60) . Shown are the two cases discussed 
in the text. 



attained at x' = and this maximum value is (note that this is true even for x — 0). When x > 0, the objective 
can increase without bound as x' tends to oo, resulting in a maximum value of oo. Putting these two cases together 
yields: 

5* P {x) = S P (-x). 
Generalizing this argument to multi-dimensional x, yields Eq. <[28]>. 



4>{x') 



x>0 



x<0 



x' -1 




Fig. 5. Illustration of the objective function, labeled in the maximization described by Eq. j6l) . Shown are the two cases discussed 

in the text. 



Next we consider Sb x(i)( x )^ which in one dimension is the same as 8 Baii(i)( x ) ■ This functions is zero only for 
— 1 < x < 1. Its conjugate is computed from: 

S Box(l)( x ) = m ^ X = maX { X ' X ~ Sbox(1)( x ')} ■ (61) 

Again, we have two cases, x < and x > 0, illustrated in Fig. [5] In the former case the maximum value of the 
objective is attained at x' = — 1, and this maximum value is —x. In the latter case the maximum value is x, and it 
is attained at x' = 1. Hence, we have: 

S *Box(l)( X ) = 14 
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For multi-dimensional x, Sbox(i)( x ) ^Ball(i){ x )< an d this is also reflected in the conjugates: 

S *Box(l)( x ) = N|l, 

5 *Bail{i)( x ) = INIa- 





<f>(x') 

X < -1 


4>(x') 

-1 < X < 1 


1 < X 


4>(x>) 


x 1 


\ ^ 


\ x ' 1 





Fig. 6. Illustration of the objective function, labeled (j>(x'), in the maximization described by Eq. {62} . Shown are the three cases discussed 
in the text. 



It is also interesting to verify that $ B * ox ^{x) is indeed §Box(i)( x ) by showing, again in one dimension, that 
\ x \* = ^Boxfi)^)- Illustrating this example helps in understanding the convex conjugate of multi-dimensional 
£i-based semi-norms. The relevant conjugate is computed from: 

|x|* = max 4>{x') = max {x'x — \x'\} . (62) 

x' x' 

Here, we need to analyze three cases: x <—l, —1 <x <1, and x > 1. The corresponding sketch is in Fig. [6] The 
— |x| term in the objective makes an upside-down wedge, and the x'x term serves to tip this wedge. In the second 
case, the wedge is tipped, but still opens up downward so that the objective is maximized at x' — 0, attaining 
there the value of 0. In the first and third cases, the wedge is tipped so much that part of it points upward and the 
objective can increase without bound, attaining the value of oo. Putting these cases together does indeed yield: 

M* = 3box(1)( x )- 

Similar reasoning is used to obtain Eq. (|39|) from Eq. ([38|. 



Appendix C 

Computation of important proximal mappings 

This appendix fills in important steps in computing some of the proximal mappings in the text, where it is 
necessary to use geometrical reasoning in addition to setting the gradient of the objective to zero. 



The conjugate of the TV semi-norm in Eq. (39 1 leads to the following proximal mapping computation: 



prox a [F*](z) = argmin j S Box (X)(W\) 



2a 
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where z,z' £ V, and absolute value, | • |, of a spatial-vector image V yields an image, in /, of the spatial -vector- 
magnitude. The quadratic term is minimized when z — z' , but the indicator function excludes this minimizer when 
z <=£ Box(X). To solve this problem, we write the quadratic as a sum over pixels: 



z 



2(7 2(7 

where i indexes the image pixels and each Z{ and z[ is a spatial-vector. The indicator function places an upper 
bound on the magnitude of each spatial-vector \z'^\ < A. The proximal mapping is built pixel-by-pixel considering 
two cases: if \zi\ < A, then prox a [F*](z)i = zf, if \zi\ > A, then z[ is chosen to be closest to z; while respecting 
\z[\ < A which leads to a scaling of the magnitude of z, t and prox a [F*](z)i — Xzi/\z,i\. Note that the constant 
a does not enter into this calculation. Putting the cases and components all together yields the second part of the 
proximal mapping in Eq. $44). 



For the KL-TV problem the proximal mapping for the data term is computed from Eq. (47 1: 

prox a [F*}(p) = argmin | fc^Ja _ ^ [glnpos(l D - p% + 5 P (1 D . 

We note that the objective is a smooth function in the positive orthant of p' E D. Accordingly, we differentiate the 
objective with respect to p' ignoring the pos(-) and indicator functions, keeping in mind that we have to check that 
the minimizer p' is non-negative. Performing the differentiation and setting to zero yields the following quadratic 
equation: 

p' 2 - (1 D + p)p' + p - ag = 0, 
and substituting into the quadratic equation yields: 



\{l D +p±^{l D -p) 2 +Aag). 



P 2 

We have two possible solutions, but it turns out that applying the restriction — p' > selects the negative root. 
To see this, we evaluate 1^ — p' at both roots: 

Id ~P' = \{1d ~ P T V^CTd - P) 2 + ^9)- 
Using the fact that the data are non-negative, we have 



V(l D -p) 2 +4a 9 > \\ D -p\; 
the positive root clearly leads to possible negative values for — p' while the negative root respects 1^ — p' > 



and yields Eq. (49 1. 

For the final computation of a proximal mapping, we take a look at the data term of the constrained, TV- 



minimization problem. From Eq. (57 1, the proximal mapping of interest is evaluated by: 

prox a [F*](p) = argmin ( — P — +e||p'|| 2 + {p ,9)d 

p> L 2cr 

Note the first term in the objective is spherically symmetric about p and increasing with distance from p, and the 
second term is also spherically symmetric about 0^ and increasing with distance from 0^. If just these two terms 
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were present the minimum would lie on the line segment between o and p. The third term, however, complicates 
the situation a little. We note that this term is linear in p', and it can be combined with the first term by completing 
the square. Performing this manipulation and ignoring constant terms (independent of p') yields: 

r™„ s • / \\P~P' ~ C5ll2 , II fu 

proXvlF-tKp) = argmm < h e\\p \\ 2 



2^ 

By the geometric considerations discussed above, the minimizer lies on the line segment between Op and p — erg. 



Analyzing this one-dimensional minimization leads to Eq. (59 1 



Appendix D 

The finite differencing form of the image gradient and divergence 

In this appendix we write down the explicit forms of the finite differencing approximations of V and — div in 
two dimensions used in this article. We use x E I to represent an M x M image and Xi j to refer to the (i,j)th 
pixel of x. To specify the linear transform V, we introduce the differencing images A s a; G / and A t x G /: 



A s Xij 



Using these definitions, V can be written as: 



x i+l.j ~ x i,j i < M 



-Xi t j i = M 



Xi,j+i ~ Xi,j j < M 
-Xij j = M 




Vx = 

With this form of V, its transpose — div becomes: 

-div = {-(A s Xij - AsXi-ij) - {A t x itj - AtXi j-x), i and j e [1, M]} , 

\ A t x J 

where the elements referred to outside the image border are set to zero: A s xa.j ~ A s Xi y o = A t xoj = A t Xi_o = 0. 
What the particular form of V is in its discrete form is not that important, but it is critical that the discrete forms 
of —div and V are the transposes of each other. 

Appendix E 

Preconditioned Chambolle-Pock algorithm demonstrated on the KL-TV optimization problem 



Chambolle and Pock followed their article, Ref. [20|, with a pre-conditioned version of their algorithm that suits 
our purpose of optimization problem proto-typing while potentially improving algorithm efficiency substantially for 
the £|-TV and KL-TV optimization problems with small A. The new algorithm replaces the constants a and r with 
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vector quantities that are computed directly from the system matrix K, which yields a vector in space Y from a 
vector in space X. One form of the suggested, diagonal pre-conditioners uses the following weights: 

E =wk- m) 

where £ E Y, T e X, and \K\ is the matrix formed by taking the absolute value of each element of K. In order 
to generate the CP algorithm instance incorporating pre-conditioning, the proximal mapping needs to be modified: 

proxx[F](y) = arg min + ±(y - y') T (^^T") } ' (65) 

The second term in this minimization is still quadratic but no longer spherically symmetric. The difficulty in deriving 
the pre-conditioned CP algorithm instances is similar to that of the original algorithm. On the one hand there is no 
need for finding \\K H2, but on the other hand deriving the proximal mapping may become more involved. For the 
^2-TV and the KL-TV optimization problems, the proximal mapping is simple to derive and it turns out that the 
mappings can be arrived at by replacing a by £ and r by T. 

The gain in efficiency for small A comes from being able to absorb this parameter into the TV term and allowing 
£ to account for the mismatch between TV and data agreement terms. We modify the definitions of V and — div 
matrices from Appendix |D| 

AA s x 
XA t x 



and 



1 

-divx \ I = {-\(A s x id - A s ajj_i^) - X(A t x itj - A t x hJ ^i),i and j G [1,M]} , 



At* 

where again the elements referred to outside the image border are set to zero: A s xoj = A s Xi t o = A t xoj = 
A t Xifl = 0. 

For a complete example, we write the pre-conditioned CP algorithm instance for KL-TV in Listing [9] To illustrate 
the potential gain in efficiency, we show the condition primal-dual gap as a function of iteration number for the 
KL-TV problem with A = 2 x 1CT 5 in Fig. [7] While we have presented the pre-conditioned CP algorithm as a 
patch for the small A case, it really provides an alternative prototyping algorithm and it can be used instead of the 
original CP algorithm. 
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iterations iterations 



Fig. 7. (Left) Convergence of the partial primal-dual gap for the CP algorithm instance solving Eq. {45} for A = 2 X 10~ 5 for the original 
and pre-conditioned CP algorithm. (Right) Plot indicating agreement with condition 1 for the KL-TV optimization problem. See Fig. [3] for 
explanation. 

Algorithm 9 Pseudocode for TV-steps of the KL-TV pre-conditioned CP algorithm instance. 

l: Ei <- 1 D /(|A|1/); S 2 <- l y /(|V A |l/); T^l!/{\A T \1 D + \div x \l v ) 
2: 6 <- 1; n <- 

3: initialize uq, po, and qo to zero values 

4: U <- tt 

5: repeat 

6: p n+ i <- | (l_D + + Si^« n - (p„ + Ei^4w„ - 1 D ) 2 + 4Si^ 

7: «- (q n + E 2 Vam„)/ max(lj, |g n + E 2 Vau„|) 

- TA T p n+1 + Tdiv x q n +i 

9: U n+1 <- U n+ i + 0(u n+ i - U n ) 

10: n <- n + 1 
11: until n> N 
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